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ABSTRACT 

We present a detailed strong lensing analysis of an HST/ACS legacy dataset for the first gravitational lens, 
Q0957+56L With deep imaging we identify 24 new strongly lensed features, which we use to constrain mass 
models. We model the stellar component of the lens galaxy using the observed luminosity distribution, and the 
dark matte r halo using several different density profiles. We draw on the weak lensing analysis by Nakaj ima 
et al.| ("2009 ) to constrain the mass sheet and environmental terms in the lens potential. Adopting the well- 
measured time delay, we find = ^S^l^ km s~^ Mpc~^ (68% CL) using lensing constraints alone. The principal 
uncertainties in Hq are tied to the stellar mass-to-light ratio (a variant of the radial profile degeneracy in lens 
models). Adding constraints from stellar population synthesis models, we obtain Hq = 79.3;^g-5 km s~^ Mpc~^ 
(68% CL). We infer that the lens galaxy has a rising rotation curve and a dark matter distribution with an inner 
core. Intriguingly, we find the quasar flux ratios predicted by our models to be inconsistent with existing radio 
measurements, suggesting the presence of substructure in the lens. 



1. INTRODUCTION 

Since its inception, strong gravitational lensing has been 
used to probe galaxy mass distributions in a way that com- 
plements photometric and dynamical studies. In systems with 
numerous lensed features, especially partial or full Einstein 
rings, lensing can provide a fairly det ailed description of the 
lens gala xy mass d istribution (e.g., Kochane k et al.||1989| 
2001; Kocha nek|199 5i Lehar et al. 1996, 19971 |Trott er"eraL 



2000; Koopm ans et al. 2006; G avazzi et al. 2008; Suyuet 
al. 2008^ 2009| . In systems with only a few lensed images. 



However, there may be significant systematic uncertainties in 
the lens mass distribution. For example, four-image lenses 
typically constrain t he angular structu re of the lens potential 



reasonably well (e.g., |Keeton et al.ll99 7), but often cannot de- 
termine the radial profile because the im ages lie at similar dis- 



tances from the center o f the galaxy (e.g., |Keeton & Kochanek 



1997[ |Kochanek||2002| ) . Two-image lenses are better able to 



probe the radial jprofile, but provide only poor constraints on 
the angular structure of the lens potential ^Rusin et al. [2 0031 
In addition, further complications may arise if the lens galaxy 
lies in a group or clust er of galaxies t hat significantly affect 
the lens potential (see Keeton & Zabludof f 2004 ). 

These issues explain the challenges associated with model- 
ing the first gravitational lens discovered, Q0957+561 ( Walsh" 
|et al. 1979). The two lensed images of the background quasar 
provide limited constraints on a lens potential that is com- 
plicated by the p resence of a clus ter surrounding the main 
lens galaxy (e.g., |Kochanek|p^9T]). To move beyond sim- 



ple two-image recon structions, |Gorenstein et al^ ( |1988b| ) and 
[Garrett et al.| ( |1994| ) used VLBI observationsto reso l ve the 
radio structure of the quasars; but |Grogin & Nara yan| ( |1996] ) 
and Barkana et al. (1999) found that the new radio constraints 
still could not strong ly constrain the lens p oten tial. Seeking 
yet more constraints, |Bemstein et al.| ( [l997 ) and [Keeton et al. 
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( |2000| ) used the Hubble Space Telescope to discover lensed 
images of the quasar host galaxy, but even they found that a 
range of models could reproduce all the strong lensing data 
(also see Bernstein & Fischer^ 1999). There were parallel ef- 
forts to obtain other kinds of data to pr ovide independent con- 
straints on the lens galaxy and cluster. [Tonry & Franx|(T997] ) 
measu red the central velocity dispersion of the lens galaxy. 



which Romanowsky & Kocha nek (1999) used in stellar dy- 
namical model s. [Fischer et al.|((1997| ) studied weak lensing 
by the cluster. |Chartas et al.|(|2002|) used the Chandra X-ray 
Observatory to detect the hot intracluster gas and estimate the 
cluster mass. Despite the growing amount of data for this 
system, a definitive description of the mass distribution has 
remained elusive. 

Uncertainties in the mass distribution propagate into at- 
tempts to use the lens time delay to determine the Hubble con- 
stant, Hq. In principle, lens time delays offer a simple mea- 
surement of the Hubble constant at cosmological distances 
that bypasses the distance ladder. In practice, howev er. Hp 
is degenerate with certain aspects of the mass model dFalco 



'et al.""1985 i Gorenstein et al."1988ai Williams & SahalpHnnf 
Kochanek 2002; Saha & WilHams 2006J). Due to these de- 
generacies, measurements of Hq in individual lens systems 
have yielded quite varied results. Figure [T] shows the lens sys- 
tems for which individual modeling of the mass distribution 
has been done to determine Hq. Examining the figure, it is un- 
clear what value of Hq such studies prefer. Roughly half of the 
previous studies are consistent with Hq < 60 km s~^ Mpc~^ , 
with most preferring Hq = 65-80 km s~^ Mpc~^ . Strikingly, 
only two studies, o f Q0957+561 ( Bernstein & Fischer|19W 



[Keeton et al.(2QQ0l ) and PKS 1830-211 ( [Lidman et al.|1999] ), 
found ^0 values > 85 km s~^ Mpc~^ . 

In non-lensing studies, measurements of Hq have reached 
better agre ement. Using Cepheid v ariable stars as distance 
indicators, [Freedman et al.[ ( [200 1[ ) obtained H n = 12 ±2 



Riess et al. 



(2009) re 



(stat.) ±7 (syst.) km s^Mpc" 
cently analyzed a higher quality and more homogeneous 
sample of Cepheids and supernovae and found Hq = 74.2± 
3.6 km s~^ Mpc~^ Results from measurements of the Cos- 
mic Microwave Background (CMB) have given similar re- 
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Fig. 1. — Current Hq measurements in individual lens studies. Black er- 
rorbars indicate the most recent measurement for a given lens. When more 
than one such effort has been made, we present the second most recent ef- 
fort in red. The measurements span a range of median values and uncertain- 
ties; a simple test indicates the scatter is not purely statistical at > 99% 
confidence. References — B02 18+357 : iWucknitz et al. (2004 ); York et al. 
f2005 1, RX J091 l-h0551: IHjorth et al.l(|2002 1, FBQ095 1-^2635: Jakobsson et 
|al. (20051, Q0957-h561: Bernstein & Fischer ( 1999 1 ; Keeton et al. (2000», 
PG 1115-^080: Treu & Koopmans (2002i; Chartas et al. (2004), SDSS 
1206-h4332: Paraticz et al. (2009i, SBS 1520-^530: Burud et al. ( 2002b », 
B 1600-^434: Koopmans et al. ( 2000i; Burud et al. ( 2000 1, B 1608-^656: Fass-] 
Inacht et al. ( 2002 1; Koopmans et al.|^2003rf, SDSS 1650+4251- V uissoz et al. 
™07i, PKS 1830-211: Lid man et al.| ( |1^99} ; |Wmn et al.|t200^j , HE 2149- 
2745: . Burud et al.,^2002a^ . 



suits. Most recently, Dunkley et al."(2008) analyzed the 5th 
year data release of the Wilkinson Microwave Anisotropy 
Probe (WMAP) to find Hq = 71.9!|^ km s'^ Mpc-\ assum- 
ing a universe with a flat geometry and a cosmological con- 
stant. With joint analyses of the various constraints, measure- 
ments of ^0 can now approach the percent level. For instance, 
[Komatsu et al.| ([2008) combined Hq measurements from the 
CMB, SNIa, and Baryon Acoustic Oscillations (BAO) to ob- 
tain Ho = 70.5 ± 1 .3 km s~^ Mpc~^ or a 2.3% determination of 
Ho. 

Given the apparent consensus in non-lensing studies of Hq, 
the scatter in results from lensing is puzzling. Are conven- 
tional lens models failing to account for variations in key 
components of lens models, such as the logarithmic density 
slope, angular structure, and/or substructure? Are the (often 
large) uncertainties in lensing measurements of Hq fully un- 
derstood? How can the uncertainties be reduced? 

One way to confront these concerns is to mod el a sta 
tistical ensemble of lenses when measuring Hq. 



Saha & 



Williams ( 2006 ) used non-parametric lens modeling to gen- 
erate free-form mass maps of 10 lenses and obtain Hq = 
km s~^ Mpc~^ . Adding one lens to this sam- 



72 5+^-^ 



pie, |Coles ( |2008| used similar techniques and found Hq = 



71;^g km s ^ Mpc \ Oguri (2007 ) took a parametric approach 



but attempted to incorporate a reasonable amount of complex 
ity and scatter in the models; his analysis of 16 time delay 
lenses yielded //q = 68 ± 6 (stat.) ±8 (syst.) km s"^ Mpc"^ 



While these results are enticing, the systematic uncertain- 
ties may still be underestimated and cannot be beaten down 
with sample size. In particular, lensing constraints on Hq are 
known to be affected by the "mass-sheet degeneracy": a uni- 
form sheet of mass can be added to a lensing potential in a 
way that leaves the positions and brightnesses of the images 
unchanged but rescales the inferred H ubble constant ( |Falco 



et al.||1983t [Gorenstein et al.|[l988a| . Groups and clusters 

the ^' - — 



surrounding lens galaxies or along the line of sight can act 
as mass she ets that bias lensing measurements of Hq ( Kee-| 
[ton & Zablud off 2004). While there is considerable effort to 
characterize the local and line-of-sight environments of strong 
lenses (e.g.,'Fassnacht & Lubin"2002UFassnacht et al."2006| 
2008; Momcheva et al. 2006; WilHams et al. 2006; Augeret] 
|al. 2007), and |Oguri| ( |2007 ) attempted to account for the mass 
sReet in several lenses for which it is expected to be signifi- 
cant, it is still not clear whether the systematic uncertainties 
are fully understood. 

In order to identify and control systematics, it seems that we 
still need to model individual lens systems in detail. One ad- 
vantage of this approach is the ability to use a dditional sources 
of informati on, such a s weak lensing (e.g., Fisch er et al.|1997t 
[Bernstein & Fischer! 1999) or stellar dyn amics (e.g.,|Grogin & 



Narayan 1996 ; Tonry & Franx 1997 ; Treu & Koopmans 2002[ 
jBarnabe & Koopmans^ 20 07), to reduce the uncertainties m- 
herent in strong lens ing. With the four- image len s B 1608 +656 



(Myers et al.| |1995| ), for example, ^Suyu et al.| ([2008j |2009| ) 
demonstrate that lensing and velocity dispersion information 
can be combined to get a ~ 7% determination of Hq, in which 
the modeling uncertainties are comparable t o the uncertain- 



ties in the measured time delays themselves ( Fassnacht et al. 
2002| ). With Q0957, we have a system whose complicated 
lens potential not only presents certain challenges for mea- 
suring Hq, but also bestows certain opportunities. In partic- 
ular, the presence of the cluster around the lens galaxies of- 
fers a rare chance to do a weak lensing analysis of an indi- 
vidual st rong lens system (as opposed to a stacked ensem- 
ble; e.g., JGavazzi et al.|[2Qn7l [Lagattuta et al.|[2009l ), which 
is key to breaking the miass sheet degeneracy and controlling 
that vital systematic uncertainty. Furthermore, there is now 
a rich variety of data available for Q0957, which enables a 
broad-based analysis. We therefore study Q0957 to inves- 
tigate statistical and systematic uncertainties in lens models 
and The strong lensing analysis presented here comple- 
me nts and draws upon th e weak lensing analysis presented 
by |Nakajima et al.| (2009); both are based on the same new 
HST /ACS data. In th is system extensive observational effort 
(e.g.,ISchild & Cholfi n 1986; Kundic et al.|1997||Oscoz"etaL 
200 ij i has yielded a ve ry precise time del ay with an uncer 



tamty of just 0.02% ( jColley et al.|2003| ). While we do not 
expect to achieve that level of precision in lens models, our 
goal is to see just how well we can do. We combine our new 
data with a number of independent constraints from previous 
observations as well as stellar population synthesis models, in 
order to obtain the most precise measurement of Hq to date 
for Q0957. 

In this paper we assume a flat universe with matter density 
= 0.274 and cosmological constant ft a = 0.726, which is 
consistent with the 5 -y ear WMAP+SN+B AO constraints from 
[Komatsu et ar| ( [2Q08l ). 

2. OBSERVATIONS AND DATA ANALYSIS 
2.1. Observations 




Fig. 2. — (a, left) A false color image of the central 30^^ of our combined F606W and F814W images of Q0957+561. (b, right) Close-up of the strong lensing 
region, after the main lensing galaxy and quasar images have been subtracted. The white cross indicates the center of the lens galaxy Gl, while the green crosses 
indicate the quasar positions A and B. The red boxes and yellow circles indicate the "blobs" and "knots" identified by Bernstein et al., (^1997j . Newly resolved 
faint features are seen south and east around quasar B and southwest of quasar A. The orange circle indicates an unknown object, not associated with any lensed 
features. Since the light profile of the object is consistent with the PSF, we surmise it is a faint halo star in the foreground of the lens. 



We conducted deep observations of Q0957+561 on 2005 
October 10-11 with HST's Advanced Camera for Surveys as 
part of program GO- 10569. Using four pointings of 7.7 ks 
in the F606W filter and 3.8 ks in the F814W filter, we cre- 
ated a 6' X 6' mosaic of the field. We arranged the pointings 
to overlap in the central 30^' region, forming a 30 ks image 
in the F606W filter (15 ks in F814W) for our strong lensing 
analysis with a final pixel scale of 0.03". The large number 
of exposures in this central region allows us to use a simple 
image-combining algorithm that avoids the undesirable PSF 
broadening and noise correlation of the common Drizzle al- 
gorithm ( [Fruchter & Hook|2002j ): 

1 . An astrometric solution is derived for each exposure by 

compounding the ACSAVFC coordinate map of Ander^ 

son| ( |2002| ) with an additional affine transformation to 



account for pointing errors, stellar aberration, and slight 
plate- scale variations due to the HST "breathing mode." 
The coefficients of the affine transformation are derived 
by registering objects detected in individual exposures. 

2. A grid of 0.03^' pixels is created for the combined im- 
age. Each pixel in each exposure is mapped to a single 
destination pixel. Input pixels flagged as invalid due to 
detector defects, etc., are discarded. 

3. For each destination pixel, we average all of the input 
pixels, using a sigma-clipping algorithm to eliminate 
pixels contaminated by cosmic rays. 

The procedure is identical to the use of Drizzle with a "drop 
zone" of zero size. Since each input pixel contributes to only 
one output pixel, the output pixels have uncorrected noise. 
The combining algorithm broadens the PSF only by an ef- 
fective convolution with the output pixel square. The final 
pixel size is chosen such that it is small enough not to degrade 
the resolution, but coarse enough that there are enough input 
pixels for averaging and outlier rejection. We present a false 



color image of our combined F606W and F814W images in 
Figure [2^. 

To look for new strong lensing constraints, we subtract the 
bright quasar images A and B using the PSF derived from ob- 
servations of the star HD237859. Since the PSF varies with 
focal plane position as well as time, we observed the star as 
close as possible in time and chip position to each quasar im- 
age in each of the four pointings. 

2.2. Lens Galaxy Properties 

We model the main lensing galaxy using the IRAF EL- 
LIPSE task, masking out regions where quasar subtraction 
takes place as well as any bright regions not associated with 
the galaxy (e.g., other lensed features). Our resulting IRAF 
model provides a measurement of the galaxy's isophotes and 
total fl ux (see Table [T]). As shown previously (Bernstein et al.J 
1997| ), the isophotes of the lens galaxy exhibit an ellipticity 
gradient and a position angle twist (Figure[3]). These isophotal 
features may complicate the lensing pot entia l, so we incorpo- 
rate them directly into our lens models ( |3.2| ). We also use the 
photometry of the lens galaxy to constrain stellar population 
synthesis models and estimate the stellar mass-to-light ratio 

(gg. 



2.3. Faint Strong Lensing Features 

We subtract the model galaxy from the quasar- subtracted 
image to produce the final image of the strong lensing region, 
which is shown in Figure [2]3. This image reveals several new, 
previously unresolved or undetected strongly lensed features. 
Since the morp hology is similar to the host galaxy arc from 
NICMOS (Keeton et al.l2000), we conjecture that the optical 
features are most likely images of star forming regions of the 
quasar host galaxy at z = 1 .41 . 

The lensed "blobs" a nd "knots" indicated in Figure[2]3 were 
pre viously identified by Bernstein et al.Y199 7|), and were used 
by [Bernstein & Fischer, ( 1999j and Keeton et al.| (2000) as 
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Fig. 3. — Ellipticity and position angle of galaxy isophotes, plotted as a function of semi-major axis. Note the increase in ellipticity beginning at ' 
decrease in position angle beginning at ~ 0.3^^ 



\" and the 



constraints on lens models. To derive new constraints from 



our new strongly lensed features, we use the models of Keeton 



et al.|(|2QQ0|) a s a starting point. Using the lensmodel software 
( |Keeton|2QQl| ), we check to see how these older models would 
map new features in the image plane. Specifically, we take an 
observed image position, map it to the source plane, and then 
find all corresponding images using the old lens models. We 
then look for the predic ted images in our new HST data. Un- 
fortunately, we find thel Keeton et al.| models cannot sensibly 
reproduce the lensing we see in the HST data. These models 
fail most notably for the new features south and east of quasar 
B, mapping bright peaks in the data to blank regions of the 
sky. 

In order to make sense of the new features, we examine 
the morphology of the images in the data. Specifically, in the 
area around the bright "knots" shown in Figure [2]3 we notice 
a distinct fork-like feature extending from either side of the 
knots. Using peaks in this structure, we postulate a set of new 
constraints as shown in the left panel of Figure |4] which we 
call our "primary" set of new constraints. We use these pri- 
mary features to constrain a singular isothermal ellipsoid lens 
model, and see how the resulting model maps other faint fea- 
tures found in the HST data. Specifically, we consider faint 
features to the east of quasar B, indicated by the large box 
in Figure |4^. To our surprise, the model derived from our 
primary features accurately describes the remaining faint fea- 
tures, mapping peaks of the images in reasonable ways (Fig- 
ure |4]3) and giving us confidence in the identification of new 
lensed features. We therefore add these features to our list of 
constraints, resulting in a final data set which is given in Table 
|2] We note that all presumed multiple images of each source 
are observed to have similar colors (within the photometric 
noise). 

To obtain the position and uncertainty of each peak listed 
in Table |2] we examine the brightest pixel(s) within the peak. 
If the brightest pixel is more than lOa noise above any other 
pixel in the peak (as for the "knots" or source IV in Table [2]), 
we set the position error to be ±1 pixel or ±0.03'^ If the 
brightest pixel is 3-lOa noise above the surrounding pixels, we 
conservatively set the error to ±1.5 pixels or ±0.05". If there 
are multiple pixels in the peak that are within 3 a noise of the 
brightest, we take the average position of all such pixels and 
set the error to be the distance from this average to the farthest 
of the bright pixels, plus our nominal value of 0.05 '^ 

3. LENS MODELING METHODS 



3.1. General Theory 

In this section we present a brief review of the lensing the 
ory that is particularly pertinent to our analysis of Q0 957. For 
further discussio n o f strong lensing the ory, please see |Schnei 
|der et al.| ( [T992l ) and |Kochanek| ( |2fe4l ). 

As predicted by Einstein's General Relativity in the early 
20th century, a mass concentration near the line of sight to 
a background object may significantly displace and distort a 
background image. The angular position u of the source and 
the angular position of x of an image are related by the lens 
equation. 



(1) 



where (j) is the (scaled) gravitational potential due to mass at 
the lens redshift. The lens potential is given by 



V^^(x) = 2k(x) = 2S/S, 



(2) 



where the convergence hi equals the surface mass density (S) 
scaled by the critical density for lensing (Hem). 

Since the deflected light travels along different ray paths, 
there is a difference in the light travel time for different im- 
ages. This difference, known as the time delay, can be mea- 
sured if the source is sufficiently variable. The time delay 
between images at positions Xi and xj is given by 



1+z/ DoiDq 
c Dis 



(3) 



X \ ^ {\xi-u\^-\xj-u\^)-[(l)(Xi)-(l)(xj)] 



where zi is the lens redshift and Z)^/, Dos, and Dis are angular- 
diameter distances from the observer to the lens, the observer 
to the source, and the lens to the source respectively. Com- 
bining a measured time delay with a lens model (to infer the 
source position u and the lens potential (j)) provides a mea- 
surement of the distance combination DqiDos/Dis, which is 
inversely proportional to //q- (The distance ratio also depends 
on cosmological parameters VLm and I^a, but that is typically 
a small effect compared with other uncertainties in the prob- 
lem.) 

A well-known problem in lensing constraints on Hq is the 
"mass-sheet degeneracy" ( Falco et al.|1985[rGorenstein et al. 
,1988aj . For any potential q) that fits the data, one can construct 



Fig. 4. — Our proposed mapping of the new lensed features (cf. Tablej2j. Corresponding shapes and colors indicate proposed images of a given source. The 
grid-like pattern that appears is due to the imperfect subtraction of the dinraction pattern of the quasar images, (a, left) We use the images indicated to constrain 
new lens models, and then use those models to test features in the yellow box. (b, right) We take one of the points in each shape/color pair, map it back to the 
source, and then find the corresponding images. The predicted positions match very well with features in the image, which gives us confidence both that the 
"primary" constraints are valid and that the additional lensed features are real. (Points in white rectangles indicate features that are too close to the noise level to 
provide confident peaks.) All of the colored points are used as constraints in our subsequent modeling. 



another potential 



-f^'\x\^ + (l-f^')(l) 



(4) 



that yields exactly the same image positions and flux ratios. 
The addition of the mass sheet k,' does, however, rescale the 
time delays and hence the inferred Hubble constant, such that 
= (1 - k')Ho. The challenge for Q0957 is that the cluster 
around the lens contributes a term (hZc in eq. [TT] below) that 
acts like a mass sheet, which we cannot constrain by fitting 
the positions and fluxes of the strongly lensed images. We 
therefore set k,c = when doing the strong lens modeling, so 
we obtain some Hubble constant estimate Ho^modei • We then 
use weak lensing data to constrain a^c, which yields our cor- 
rected Hubble constant estimate 



model • 



(5) 



Additionally, since the mass sheet correction is a rescaling of 
the potential, we must multiply each term in the potential by 
the same factor of 1 - as in eq. ([5|). For our results in ^ 
we indicate all parameters to which this applies. 

3.2. Mass Models 
3.2.1. Stellar component 

The lensing potential of Q0957+561 may be complicated 
by the ellipticity gradient and isophote twist seen in the lumi- 
nous component of the lens galaxy (see Figure [3]). To allow 
such features in lens models, Keetonetal. (2000) introduced 
"double pseudo-Jaffe" models featuring a superposition of 
two ellipsoidal mass distributions, centered on the galaxy po- 
sition, with different ellipticities, orientations, and scale radii. 
We initially adopted similar models, but quickly judged them 
to be unsatisfactory. Constrained by the new lensed features. 



double pseudo-Jaffe models were adopting odd forms, such as 
one round and one very flattened component, that seemed un- 
realistic and unlike what is observed for giant elliptical galax- 
ies. 

A better approach is to incorporate the observed elliptic- 
ity gradient and isophote twist directly by using our isopho- 
tal model of the stellar component. We combine the model 
galaxy with an assumed stellar mass-to-light ratio to construct 
a convergence map. (We vary the m ass-to-light ratio during 
the modeling, as discussed in §3.4 ) To compute the corre- 
sponding lensing potential, we solve the Poisson equation us- 
ing Fourier transforms. In Fourier space, the Poisson equation 
([2]) has the form 



-k^F(k) = 2K(k) 



(6) 



where K(k) is the Fourier transform of the convergence map 
and F(k) is the Fourier transform of the lens potential. It 
is straightforward to construct the convergence on a two- 
dimensional grid, calculate K(k) using FFTs, solve for F(k), 
and then do an inverse Fourier transform to get 0(x). We can 
then obtain the lensing deflection and magnification by com- 
puting derivatives of (j)(x) in Fourier space. This method is 
discussed in more detail by | van de Yen et al." ( 2009| ). 



3.2.2. Dark matter 

The luminous galaxy is presumably embedded in its own 
dark matter halo and any halo associated with the surround- 
ing cluster. The cluster in Q0957 is non-negligible: previ- 
ous weak lensing studies constrained the mean convergence 
within 30'^ of the le ns to be hi^o" = 0.26 ±0.16 and indicated 
a shea r of 7 ^ 0.2 ( [Fischer et al.||1997| [Bernstein & Fischer] 
1999] ). (In |3.5[ we report our own constraints on cluster mass 
models.) Also, X-ray observations of the intracluster gas in- 



6 



dicated a convergence from the clus ter at the quasar pos itions 



of K.A = 0.22!^;^^ and k.b = 0.21!^;^^ ( [Chartas et al.|2QQ2 |. 



Since the lens galaxy is the brightest and (presuniiably) most 
massive galaxy in the cluster, it seems natural to assume as a 
fiducial model that the galaxy lies at the center of the cluster. 
In this case the dark matter halo we insert in our models rep- 
resents some combination of dark matter associated with the 
galaxy and dark matter associated with the cluster as a whole. 
We consider various profiles to encompass the range of possi- 
bilities. One model we use is the Navarro-Frenk- White profile 
(NFW, .Navarro et al..l997j , 



P = 



Ps 



(r/r,)(l + r/r,)2 



(7) 



whose projected surface density has the form ( [Bartelmann 
|1996| ) 



l-Hx) 



1 



where x-rjrs, k^- psTs/T^crit^ and 



^^tan"Vx2-l (x>l) 
-TA=tanh"Vl-x2(x < 1) 
.1 (x=l) 



(8) 



(9) 



While these equations describe a spherically symmetric 
model, we can obtain an elliptical model by replacing r with 
the ellipse coordinate ^ = (x^ -\-y^ /q^)^^^ where q is the pro- 
jected axis ratio. The lensing potential and its derivatives can 
be computed for elliptical mo dels using numerical integrals 
dSchramml 19901 |Keeton|2QQT] ). 

We also use three softened power law profiles with pro- 
jected surface densities of the form 



K(0 = 



1 



(10) 



where ^ is the ellipse coordinate defined above, a is the core 
radius, bd is a normalization parameter, and the power law in- 
dex a is chosen such that MiR) ocR^ for R:^ a. In this class 
of models we study one with an isothermal profile (a = 1), 
one steeper than isothermal (a = 0.5), and one shallower 
than isothermal (a = 1.5). For a = 1 the lensing potential 
and its derivativ es can be computed analytically ( Keeton & 
Kochanek||1998f , but for a ^ 1 they require numerical inte- 
grals. 

We still need to consider the possibility that the cluster may 
not be centered on the lens galaxy, which is germane because 
the observed X-ray emission from the cluster is slightly off- 
set from the lens: the X-ray centroid is 4.3 ± 1.3 arcsec East 
and 3.2;^Qg arcsec North of Image B, or 4.7 ±1.2 arcsec from 
the lens galaxy ( jChartas et al. 2002). We carry out our full 
modeling analysis (as described in the remainder of the pa- 
per) treating the cluster as a distinct dark matter halo centered 
at the observed X-ray position. While this approach yields fits 
that are formally comparable to our fiducial results, we ulti- 
mately reject such models for two reasons. When we treat the 
offset cluster with a softened power law profile, the models 
require a large (> 30^0 core radius, which makes the "clus- 
ter" basically equivalent to a uniform mass sheet on the ~ 6" 
scale of strong lensing and thus negates the effect of having 
an extra halo. NFW models cannot mimic that behavior, of 
course, since they do not have a flat core. Instead, we find 



that an offset NFW cluster must have an unreasonably large 
ellipticity of > 0.7. We conclude that the strong lens data do 
not favor and in fact disfavor an offset cluster, so we not in- 
clude a separate cluster component in our main analysis. 

3.2.3. Environmental terms 

While our lens-centered mass profiles should account for 
the majority of the mass in the strong lensing region (apart 
from a mass sheet), they should not be expected to repre- 
sent the full complexity of the cluster potential. The cluster 
halo may not be a simple ellipsoid on large scales, and it pre- 
sumably has some lumpiness in the form of individual cluster 
galaxies. In general, we can write the lens potential due to 
structures outside the strong lensing region using a Taylor se- 
ries expansion of the form 



cos((9 - 6>^) + 7 cos 3(0 -Os)-^. 
4 6 



(11) 



where k,c is a mass sheet, 7 is an external shear, and a and S 
represent higher, third-order terms. If the structures are "far" 
enough from the Einstein radius, the higher-order terms may 
be sufficiently small that the expansion can be truncated after 
the shear; indeed this approximation is used in many lens sys- 
tems. However, as shown by Kochanek ( 1991 ) and many sub- 
sequent studies, the third-ord er terms cannot be neglected in 
Q0957. IBemstein & Fischer] d 1999 J and , Keeton et al. (20001 
included the third-order terms but imposed the assumptions 
a = -26/3 and 0^ = 0^, which corresponds to the perturbation 
from a singular isothermal sphere. We find that this assump- 
tion is too restrictive, so we include general third-order terms 
in order to allow more freedom in the models to account for 
complex features we have not explicitly modeled. 

Note that we have written eq. ( pTl ) in terms of the ampli- 
tudes (7, cr, J) and directions {6^ , Oa^s) of the shear and third- 
order terms. We can think of these as "polar" coordinate ver- 
sions of these parameters. We can equivalently define "Carte- 
sian" coordinate versions of the same parameters: 



7c = 7 cos 20^ , Js = 7 sin 20^ , 
ac = cr cos 0^ ^ crs = a sin Oa- , 
6c = S cos 36s 7 ^s = ^ sin 36s • 



(12) 



While we quote results for the "polar" parameters, we ac- 
tually carry out our modeling analysis using the "Cartesian" 
parameters. The translation is straightforward. 

In summary, our composite models consist of a stellar com- 
ponent, an elliptical halo centered on the stellar component 
that accounts for dark matter in the lens galaxy and/or cluster 
halo, and an additional set of terms corresponding to a third- 
order Taylor series expansion of the potential from the lens 
environment. The 1 1 associated model parameters are listed 
in Tabled 

3.3. Model Constraints 

Table[2]lists the positions of the lensed images used as con- 
straints on our models. We supplement the faint images dis- 
cussed in §23 with the quasar cores and radio j ets resolved 
with Very Long Baseline Inte rferometry (VLB I; |Gorensteili 
let al.|1988bt [Garrett et allT99 4 ). With a resolution of -0.1 
mas, the VLB I observations reveal structure within the jets 
and provide astrometry with formal errors < 1 mas. Previ- 
ous studies of Q0957 used this excellent astrometry as strong 
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co nstraints on lens models ([B arkana et al.|1999t|Bemstein et| 
|all[T997l [Bernstein & Fisch er 1999; Keeton et al.||2Q00| ). In 
this work, however, we reconsider the use of such stringent 
constraints. Many recent works have shown that lens galax- 
ies often contain substructure in the form of CDM subhalos, 
which can perturb lens flux ratios by tens of percent or more 
and image positions at levels up to ^ 10 mas (e.g., Mao & 



Schneiderl 19981 IMetcalf & M adaul200ll |Dalal & Kochanek 
2002t|Chiba et al.|2005| |Chen et al. 2007 ). Since our models 



do not contain any substructurenthey should not be expected 
to match the image positions to better than the ~ 10 mas resid- 
uals expected from substructure. We therefore broaden the 
errorbars and adopt conservative uncertainties on the VLBI 
quasar and jet positions of 30 mas. As a check, we verified 
that the VLBI derived positions of the quasars are in good 
agreement with the positions in our HST data. 

We take the image position constraint s tog ether with con- 
straints from our weak lensing analysis ( |3.5| ) to comprise our 
main model constraints. Subsequently, we consider adding 
supplementary constraints in the form of lensed arcs detected 
with HST/NICMOS, the quasar radio flux ratio, and stellar 
population synthesis models (see ^for details). Throughout 
the analysis we us e the time delay 417.09 ±0.07 days from 
[CoUey et aLl ( |2003| ) to infer the Hubble constant. 

3.4. Strong Lensing Analysis 

In Bayesian language, our ultimate goal is to determine the 
posterior probability distribution for our model parameters 
and Ho, given the observational constraints. We have N^ata = 
60 position constraints, compared with Nparam = 39 free pa- 
rameters (11 parameters for the mass model plus 28 source 
position parameters). Hence our analysis has Ndata -Nparam = 
21 degrees of freedom. The Hubble constant analysis involves 
one additional constraint (the observed time delay) and one 
additional parameter (Hq), so it has the same number of de- 
grees of freedom. 

Formally, the posterior probability distribution has the form 



P(0\d,M) = 



P(d\0,M)P(0\M) 
P(d\M) 



(13) 



where d denotes the data that provide constraints on the pa- 
rameters associated with some model M. The likelihood of 
the data given the model is 



C = P(d\0,M)(xe 



(14) 



where is the goodness of fit. The quantity P(6\M) repre- 
sents priors on the model parameters, which we take to be 
uniform]^ The Bayesian Evidence P(d\M) is discussed below. 

To ease the search of our large parameter space, we treat the 
im age positions u sing the approximate position as defined 
by |Keeton| ( [2Q0T] ): 

^SuJ - fij -SJ^ ■ fii-Sui, (15) 



4 = "^^^I -^i^ '^^i 



where the sum runs over all images, Sxi = Xobs.i - ^mod.i is 
the position residual for image /, Si = diag(cr?,cr?) is the co- 
variance matrix for the image positions, and /i/ is the lensing 

^ It would certainly be interesting to add substructure to the models and 
use the precise radio positions and flux ratios to constrain the amount of sub- 
structure in the lens; but that is beyond the scope of the present work. 

^ Note that our uniform priors apply to the "Cartesian" coordinate versions 
of the environmental parameters (eq.[l2) and the ellipticity (cc = e cos 20e and 
es = esm26e). 



magnification tensor. Using the lens equation, the position 
residual in the source plane is Sui = Xobs,i-^<l^(Xobs,i)-Umod- 
The last step in eq. JT5] ) is valid when the position residuals 
are small such that theimage and source plane residuals are 
related by Sxf ^ /i/ • Sui. The benefit of using this approach is 
that is quadratic in each source position Umod^ so we can 
find the best-fit value analytically: 



'^mod — ^ ^7 



where 



i 



(16) 

(17) 
(18) 



where these sums now run over the known images of a given 
source. The upshot is that we do not have to search explic- 
itly through the 28 dimensions corresponding to the source 
parameters. An additional advantage is that we only have to 
compute the lens potential and its derivatives at the known 
positions of the images, which is useful for our dark matter 
models that require numerical integrals. 

We must still search the 11 -dimensional space of mass 
model parameters. We sample the posterior probability dis- 
tribution in this space using an adaptive Metropolis-Hastings 
M onte Carlo Markov Chain (MCMC) algor ithm (Roberts et| 
[aLl [T997l IHaario et al.|[2QQTl [Roberts & Rosenthal. 20QTF 
Each chain consists of a sequence of trial steps drawn from 
a multivariate Gaussian distribution. In 95% of the steps the 
Gaussian is based on an empirically updated covariance ma- 
trix to provide efficient sampling of a high-dimensional pos- 
terior distribution. In the remaining 5% of the steps the co- 
variance matrix is diagonal so the algorithm takes indepen- 
dent and relatively large steps along the coordinate axes; this 
feature acts as a "fail-safe" to help the algorithm escape lo- 
cal minima in the x^ surface and potentially discover new 
features in the posterior distribution. The probability of ac- 
cepting a trial step that modifies the likelihood from Cou to 
Cnew is mm(l, Cnew/^oid)' In other words, if a trial step in- 
creases the likelihood it is automatically accepted; while if it 
decrease the likelihood it is accepted with a probability given 
by the likelihood ratio. 

We run 25 chains simultaneously and set them up to work 
"from the outside in." That is, we generate an initial sample 
of models by drawing ^10"^ random points in the parameter 
space and optimizing them; this provides an estimate of the 
allowed parameter ranges (although without the proper statis- 
tical sampling that MCMC provides). We then select models 
with maximal/minimal values of individual parameters to use 
as starting points for the MCMC chains. By starting with well 
spread chains, the MCMC algorithm can establish the covari- 
ance matrix more quickly, and spend more time sampling the 
tails of the distribution, than it would by starting with closely- 
spaced starting points ( Gelman et al. 11995). The choice of ini- 
tial models does not matter in detail, though, because for our 
final sampling we merge only the second halves of the chains 
in order to avoid sensitivity to the initial "burn-in" phase. 

To assess whether an MCMC run has conv erged, we use 
the criterion presented by |Gelman et al.| ( p"995| ). For any given 
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parameter 6 we define 



var(i9) 



Ej.ivar,-W 



1/2 



(19) 



where vary(^) is the variance of in the individual chain 7, 
and var(6>) is the variance of 6 over the entire set of / chains. 
Heuristically, 7? is a comparison of the variance of the entire 
distribution (the numerator) and the variance within individ- 
ual chains (the denominator). The ratio will be greater than 
1 for disjoint chains, and it will de crease and asymptotically 
approach 1 as the chains converge. |Gelman et al. (1995 ) find 
that stopping an MCMC run once R reaches values below 1.2 
provides a sufficient description of the target distribution for 
most samplings. To be conservative we run until R< 1.1 for 
every parameter. We then repeat the entire MCMC analysis 
a total of three times to obtain robust sampling of the target 
distribution. 

In general we let all 1 1 model parameters vary simultane- 
ously. The only exception is the scale radius in models with an 
NFW dark matter halo. There is a strong covariance between 
Vs and other parameters, which produces a narrow, curved 
ridge in the likelihood surface that is difficult for MCMC al- 
gorithms to sample efficiently. To deal with this challenge, we 
discretely sample in logarithmic steps ranging from 0.1^' to 
1000'^ We checked that the median values of parameters in 
the posterior distributions do not vary more than 5% between 
steps. To combine results from individual runs into the fi- 
nal posterior distribution, we need to normalize the individual 
results properly using eq. fT3] ). In particular, we need to de- 
termine the normalization lactor 

P{d\M) = J P(d\0,M) P(0\M) dO , (20) 

which is known as the marginal likelihood or Bayesian Ev- 
idence. Computing this integral usually requires techniques 
that are more computationally intensive than basi c MCMC 
sampling, such as t hermodynamic integration (e.g., Marshall' 
et al.|2QQ3t|Lartillot & Phillipe 2006). However, it is possible 
to obtain a simple and useful surrogate for the evidence using 
the Bayesian Information Criterion, 



BIC = -21n/:^«;,+y^lnA^, 



(21) 



where Cmax is the maximum likelihood of the model, k is the 
number of free parameters, and N is the number points used 
in the fit. While the BIC does not provide a proper statistical 
treatment of the evidence, it has been widely used in statistics 
and astronomy (e.g. ,|Schwarz| 197 8 ; Rapet ti et al.|2007j|Davis| 
let al.|2007tpdd lel2007D aiCTissufficient for this study. 

As discussed in ^ we first examine models constrained 
only by the positions of the strongly lensed images, and then 
add supplemental constraints from weak tensing and vari- 
ous other considerations. In practice, this means we run the 
MCMC analysis to sample the likelihood eq. ([14]) based on 
the goodness of fit from eq. ([l5|). (Since we use uniform 
priors, the posterior probability distribution is proportional to 
the likelihood.) Now suppose we want to add some supple- 
mental constraints characterized by their own goodness of fit 
X^. The total is just the sum Xp + X? (i-^-' values add 
and likelihoods multiply), so we now want to sample 



Since the MCMC analysis provides a set of points drawn from 

all we need to do is take those points and re- weight 

them by a factor proportional to This provides a simple 

way to apply additional constraints without having to repeat 
the full MCMC analysis. 

3.5. Weak Lensing Analysis 

In parallel with the strong lensing analysis, we have con- 
ducted a weak lensing analysis of the full 6' x 6' map to 
constrain the mass sheet and other environmental terms in 
the lensing potential (see eq. [TT]). The techniques and re- 
sults of our weak lensing analysis are presented by |Naka- 
jima et aL] p009|). We find the cluster potential to be con 



sistent with a core-softened isothermal sphere profile, hv(r) = 
K,o[l + (r/rc)^]~^^^, with a best-fit central convergence k,o = 
0.47 ±0.17 for a core radius = 5.0", corresponding to a 
velocity dispersion of (420 ± 70) km s~^ for h = 0.70. Addi- 
tionally, we find the cluster to be consistent with an NFW pro- 
file but are unable to provide useful constraints on the cluster 
concentration and scale radius. 

One product of our weak lensing analysis is the average 
convergence within 30^' of the lens: ^1:^,30" = 0.166 ±0.056. 
This represents the net convergence including contributions 
from both the lens galaxy and the cluster. As discussed in 
§3.11 to determine the Hubble constant we need to know the 
correction from the cluster mass sheet. 



1- 



l-K, 



w30" 



1-K 



(23) 



^,30" 



where /^vv,30'' is the net convergence inside 30'^ from our weak 
lensing measurement while his,30" is the contribution from our 
strong lensing model. Since we do not impose a priori lim- 
its on hZs,30" in our models, the possibility exists that hZs,30" 
could exceed hZw,30" • In this sense, our measurement of hZw,30" 
provides an upper bound on strong lens models. We penalize 
models with hZs,30" > ^w,30'' by adding an additional x^ term 
of the form 



, 0, /^^,30'' < /^w,30'' 



When we apply the mass sheet correction to Hq and mass 
model parameters, we need to account for the measurement 
uncertainties in hZw,30"- We do this using Monte Carlo tech- 
niques. Specifically, we take each model from our MCMC 
runs and generate a distribution of values for 1 - by draw- 
ing from a Gaussian distribution for k,w,30" set by the measure- 
ment and uncertainty from our weak lensing analysis. (The 
factor of hZs,30" in eq. l23| comes from the model itself.) We 



P(0\d,M)(xe-^'^^e-^l^^. 



(22) 



th en fold this distribution into our final results reported in 
§^4l}g4l 

The weak lensing analysis also yields constraints on the av- 
erage shear in an annulus centered on the lens galaxy extend- 
ing from 20^' to 40^': the two "Cartesian" shear components 
are 7^ = -0.009 ± 0.045 and 7, = 0.092 ± 0.045, or equiva- 
lently the two "polar" components are 7 = 0.093 ± 0.045 and 
= 47.8 ± 13.9 deg|^ To compare this measurement to our 
models, we calculate the mean shear in the same annulus. 
After multiplying our value of the mean shear by the 1-k,c 

^ Note that the uncertainties are likely to be non-Gaussian for the "polar" 
components. 
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correction, we impose the weak lensing results as constraints 
on the lens models through additional terms. These con- 
straints penalize models with unusually small or large shears. 

4. RESULTS 

We first present results from lens models based on our new 
HST/ACS data (|4l]). We then consider adding additional 
constraints from the quasar radio flux ratio ( |4.2| ), stellar pop- 
ulation synthesis models ( |4.3| ), and physical properties of 
NFW halos ( |4.4b . All values we report for the parameters 
(Tf606w, ^6/, 7,^T^), as well as the dimensionless Hubble con- 
stant h = Ho/(lOO km s~^ Mpc~^), are corrected for the mass 
sheet through t he fa ctor l-hZc (including the associated un- 
certainties; see §3.f 



4.1. Basic Results: Strong and Weak Lensing 

For each of the four dark matter profiles we consider, we 
find a wide range of models that fit the HST strong lensing 
data well (xjeduced ^ example, the stellar mass-to- 

light ratio can be anywhere in the range Tf606w ~ 4-12, the 
ellipticity e ^ 0.1-0.7, and shear 7 ^ 0.04-0.12. Table|5]Hsts 
the median value and 68% and 95% confidence intervals for 
each model parameter (from the individual marginalized pos- 
terior probability distributions). The Table also lists the rela- 
tive probabilities of our four dark matter models obtained by 
integrating the posterior probability distributions over h, after 
weighting by the BIC. The range of allowed models is striking 
given that we now have so many strong lensing constraints. 
Examining the relative probabilities, we find an isothermal 
(a= I) dark matter profile is favored from lensing alone. It 
is interesting to note, however, that our isothermal models re- 
quire a values of hZs,30" which are larger than, but still consis- 
tent with our weak lensing measurements. 

There are various ways to examine the results, so let us be- 
gin with the Hubble constant. Figures fSHS] show the marginal- 
ized joint probability distributions P(u^n) for each model pa- 
rameter and the dimensionless Hubble constant h, for all 
four classes of dark matter models. Viewing the results this 
way helps reveal any important degeneracies or systematics 
that affect the inferred value of h. The most obvious fea- 
ture is a strong degeneracy between h and the stellar mass- 
to-light ratio, Tf606w, which we discuss momentarily. Fo- 
cusing on h itself. Figure [9] shows the marginalized cumula- 
tive posterior probability distribution for h from each of our 
dark matter models. Combining the four models (weighted by 
their relative probabilities), we find Hq = ^5t\l km s~^ Mpc~^ 
(68% CL). Our measurement of Hq is somewhat higher than, 
but statistically consistent with, the recent determinations of 
Ho = 74.2 ± 3.6 km s"^ Mpc"^ from SNe ( |Riess et al.|[2009t 
and 7^0 = V0.5 ± 1.3 km s"^ Mpc'^ from WMAP5+BA0+SNe 
pComatsu et al. 20 08]). Compared with previou s results from 
Q0957 (Bernstein & Fischer|[T999l |I^eton et al. 2000), our 
initial results have lowered the mediai£|value from ~ 90 to 85 
and reduced the spread by ^ 30%. The latter result is signifi- 
cant given the new complexity in our models, the relaxation of 
(previously tight) quasar and jet positions, and the elimination 
of the quasar flux ratio constraint. 

' As shown in Tablejs] the best-fit model in each model class has a reduced 
somewhat less than unity. We used the probability distribution to check 

that these values are statistically reasonable given the numbers of degrees of 

freedom. 

^ The old median value for Hq is not very well determined, because the 
previous analyses did not include the proper statistical sampling provided by 
our MCMC analysis. 



The degeneracy between h and the stellar mass-to-light ra- 
tio Tf606w arises because the total mass within the Einstein 
radius is fixed, so varying Tf606w changes the balance be- 
tween the concentrated stellar component and the more dif- 
fuse dark matter halo. That, in turn, modifies the slope of the 
total density profil e, which is know n to be the main factor that 
determines h (e.g., [Williams & Sah a 2000; Kochanek 20^. 
To illustrate these effects, we examine the monopole deffec- 
tion curve, a(r) oc M{r)/r where M(r) is the projected mass 
within radius r (e.g.,IBlandfo rd & Narayan 1986 ; Blandford] 
& Kochanek|[T957l |Cohn et al . 2001; Kochanek et al. 2006; 



van de Yen et al.||2009). This is a 2D analog of the rotation 



curve. A flat deflection curve corresponds to an isothermal 
profile, while a rising (falling) curve corresponds to a profile 
shallower (steeper) than isothermal. Figure [TO] shows the de- 
flection curves for models with different values of h. There is 
a systematic change in the slope of the deflection curve with h, 
with very little (< 2%) scatter among models at a fixed value 
ofh. In other words, even though Q0957 has a complex lens 
potential, we still recover the familiar result that the slope of 
the total density profile is what principally determines h. In 
our models, the slope of the density profile is governed by the 
stellar mass-to-light ratio, which makes Tf606w the key pa- 
rameter responsible for the range of h values. We consider 
external constraints on Tf606w in §4. 3 [ below. 

One interesting aspect of Figure [Tq] is that Q0957 appears 
to have a rising deflection curve, corresponding to a total 
density profile that is shallower than isothermal, for any rea- 
sonable value of h. Many other lens galaxies h ave profiles 
that are closer to isothermal (e.g., Cohn et al.|2001; Rusin &| 
Kochanek|20Q5l|Koopmans et al.|2006|). Q0957"i rnot, how- 



ever, unique m this regard: [Kochanek et al. ( [2006| ) found that 
HE 0435-1223 also has a rising deflection curve. They ar- 
gue that different density profiles and deflection curves can 
arise as a consequence of how galaxies populate dark mat- 
ter halos. In the halo model (see jCooray & Sheth|2002| for a 
review), a group or cluster of galaxies consists or a massive 
central galaxy surrounded by smaller satellite galaxies. Lying 
as it does at the center of the potential well, the central galaxy 
should have a higher dark matter surface density compared 
to its satellite neighbors, which would lead to a more diffuse 
mass distribution with a shallo wer profile and hence a rising 
deflection curve. In this context, [Kochanek et al.|p006| ) argue 
that HE0435 may be the central galaxy in a group of galaxies. 
Q0957 seems to fit naturally into this picture because it lies at 
or near the center of a modest cluster of galaxies (see ^3.2| ). 

One useful way to characterize a strong lens system is with 
the lensing critical curves and caustics. The critical curves re- 
veal highly magnified regions in the image plane, and the cor- 
responding caustics separate regions in the source plane that 
lead to different numbers of lensed images. Figure [TT] shows 
examples of the critical curves for our models of Q0957. It is 
clear that the newly identified HST/ACS features have tightly 
constrained where the critical curve lies, especially south- 
east of quasar B. Significant variations do still exist near the 
ends of the critical curves, suggesting that constraints from 
the faint, unused features indicated in Figure]?] could help to 
further constrain the critical curves and restrict the parameter 
space. Unfortunately, in the current data these features have a 
signal to noise ratio less then 3, making them not sufficiently 
reliable. 

In addition to Tf606w, our models exhibit a second degen- 
eracy between h and the ellipticity of the dark matter halo. 
While the exact origin of this is unclear, it is likely connected 
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Fig. 5. — 2D histograms depicting the marginahzed joint probability distributions P(0,h) for each model parameter 6 and the dimensionless Hubble constant 
h. (For an explanation of parameters, see Table [3]) We also show the model flux ratio in the lower left panel. The colorscale is linear in the probability density, 
running from black at the peak to white at zero. These results are based on dark matter models with a softened isothermal profile (a = l). 
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Fig. 6. — Similar to Fig. [5] but for models in which the dark matter halo has a power law profile that is steeper than isothermal {a = 0.5). 



to the degeneracy in Tf606w- Since the stellar component has 
a fixed angular structure whose position angle (~ 67°-82°, 
Fig. [3]) does not quite match the necessary angular structure 
of the lens potential (with position angle closer to 63°, see 
Fig.[TT]), the dark matter component needs to make up the dif- 
ference. The amount of compensation increases as the mass 
of the stellar component increases, so the halo ellipticity rises 
with Tf606W. 



Figure [12] shows the corresponding caustics in the source 
plane. Generically, the tangential caustic is elongated along 
a roughly NE-SW direction and extends beyond the radial 



caustic. There is some variation among the models, but the 
main effect is just an overall rescaling by I- k,e, where ke 
is the convergence at the Einstein radius (which is related to 
K.s,30")' Many of our newly discovered sources are inferred 
to lie within the tangential caustic and should therefore have 
four images. Since we did not necessarily identify these as 
quad systems in our original detections ( |2.3| ), it is interesting 
to examine the predicted counter-images. Figure [13] shows 
all the predicted images of the quadruply-imaged sources, for 
comparison with the detected images shown in Figure]?] (We 
show results for one particular model, but results for other 
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Fig. 7. — Similar to Fig. [5] but for models in which the dark matter halos has a power law profile that is shallower than isothermal {a = 1.5). 
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Fig. 8. — Similar to Fig. [5] but for models with an NFW dark matter halo. Here we do not show P(rs,h) since we discretely sample the scale radius (see ^3A\ 



models are similar.) We see that there are some predicted 
images that lie in relatively blank regions West of quasar B 
and South of quasar A. This is not a concern, however, be- 
cause the undetected images have magnifications that are a 
factor of 10-100 smaller than the detected images lying to the 
East of quasar B, so their predicted fluxes lie well below the 
noise level of the HST image. Some of the predicted counter- 
images (among the ones indicated by diamonds) are not so 
far below the noise, but they still lack clear peak positions 
and thus cannot currently provide further robust constraints 
on lens models. 
We consider one additional source of strong lensing con- 



straints, namely extended images of t he quasar host galaxy 
observed with HST/NICMOS by Keeton et al.|(l2000|. Fol- 
lowing Keeton et al. , we analyze the arcs by taking the re- 
solved arc around quasar A, mapping it pixel-by -pixel to the 
source plane using the lens model of interest, then mapping 
the reconstructed source back to the image plane to predict 
the arc around quasar B. Generally, we find that all models 
generated from our ACS data reproduce the NICMOS arcs 
comparably well; the NICMOS arcs do not restrict the range 
of models significantly better than the ACS data. We infer that 
the ACS data have captured most of the information present in 
the NICMOS arcs, which is not surprising given that the ACS 
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Fig. 9. — Cumulative posterior probability distribution for the dimension- 
less Hubble constant h for our four dark matter halo profiles. The solid black 
curve shows the result of combining these distributions, weighted by their 
relative probabilities. 



a; 



1 


1 1 1 1 _ 






h=0.6 : 




h=0.7 - 




h=O.Q - 




: 




h=1.0 : 


1 


1 1 1 1 " 



2 4 6 

Radius (arcsec) 

Fig. 10. — Monopole deflection profile for models with h = 0.60, 0.70, 0.80, 
0.90 and 1 .00. We actually plot the mean profile for all models within ±0.005 
of the nominal h value; the scatter among such models is < 2% across all radii 
and does not depend systematically on the particular dark matter profile. The 
vertical dotted line indicates the Einstein radius at Rein = 2.M" . The scatter 
among models drops to < 0.5% in the vicinity of Rein, indicating a robust and 
tight constraint on the Einstein radius. 

and NICMOS data span similar spatial regions and (presum- 
ably) both come from the lensed quasar host galaxy. Com- 
pared with the NICMOS arcs, the ACS data are somewhat 
cleaner to interpret because they avoid complicated interpola- 
tions to and from the image plane and offer a more straightfor- 
ward counting of degrees of freedom. We take the compati- 
bility of the NICMOS and ACS data as ad dition al reassurance 
that the mapping of faint lensed features ( |2.3| ) has been done 
correctly. 

4.2. Quasar Flux Ratio 

So far we have only considered image positions as lens- 
ing constraints. In order to include some information about 
the tensing magnification, we consider measurem ents of the 
quasar flux ratio. ^Bernstein & Fischer| ( p^99 j and |Keeton et 



al. ( 2QQQ| ) used the d elay corrected V LA measurements at 4 
cm and 6 cm by Haa rsma et al.| ( |1999] ) to constrain the quasar 
core flux ratio. While the VLA cannot resolve out the relative 
contributions of radio compontents, if the jets are invariant on 
decadal time scales, the ratio of the radio fluctuations gives 
a measurement of the core flux ratio. With this assumption, 
IHaarsma et al.| derive the core flux ratio to be 0.74 zb 0.02. 
Before applying this constraint to our lens models, we must 
consider whether our models should be expected to fit the ob- 
served flux ratio. The issue is whet her dark ma tter substruc- 
ture may perturb the flux ratio (e.g.,IMao & Schneider 1998^ 
Metcalf & Madau 2001 ; Dalai & Kochanek 2002 ; Chiba e taL] 
2005^ MacLeod et al.,2009j in a way that our smooth models 



cannot reproduce. 

High-resolution VLBI measurements show that the quasar 
images are ^1.2 mas in size ( Gorenstein et al. 1988b ), which 
corresponds to a size for the emission region in the source 
plane of ^0.9 mas or < 5.4 pc. We use the methods of 
[Dobler & Keeton| ( [2006| ) to estimate how a source of this size 
would be affected by an isothermal sphere clump placed near 
one of the images. We find that a clump of mass > 10^ 
can easily change the lensing magnification by a factor of or- 
der unity, and A^-body simulations predict such subhalos to 
be ab undant (> 10^) in a galaxy with a mass o f ~ 10^^ 
(e.g., Springel e t al.|[20Q5| [Angulo et al.||2008| ). Apparently 
we should not discount the possibility that substructure plays 
a significant role in the observed VLA flux ratio. 

Our basic models generally predict a flux ratio in the vicin- 
ity of ~ 0.5 with at most a tail extended to the range of the 
VLA measurement (see Figs. |5H8] and Table [5]). The discrep- 
ancy could be interpreted as eviaence that the VLA flux ra- 
tio is indeed perturbed by substructure. Further support for 
this hypothesis comes from the fact that the magnification 
ratio inferred from the resolved radio jets is different from 
the ratio fo r the quasar cores, and closer to the smooth model 
prediction (Bonometti| 1985] [Gorenstein et al.| 1988b [[Conner 



[et al.[p^92 i. We should be careful, of course, not to think 
that a measurement that disagrees with our smooth models is 
"wrong" and one that agrees is "right" — or to assume that 
any discrepancy involving a flux ratio automatically implies 
dark matter substructure. Nevertheless, we conclude that ex- 
isting evidence shows the flux ratio to be very intriguing and 
worthy of further study, both on its own and as possible evi- 
dence for substructure in Q0957. 

With this in mind, it is not clear how strongly we should 
impose the VLA flux ratio as a constraint on our models. We 
consider using the measurement but inflating the errorbar by 
various factors to obtain a range of constraints from strong to 
weak. Figure [14 shows the marginalized cumulative poste- 
rior probabilitymstribution for h for the different cases. The 
flux ratio constraint tends to reduce the median value of h and 
tighten the distribution. While such a result seems enticing, 
we caution that it may be artificial if the flux ratio is really 
perturbed by substructure which is absent from our model. 
Given the concerns, we choose not to use the flux ratio as 
a constraint for our final results. If there were some way to 
determine the "macro" flux ratio, however, that might help 
improve constraints on h. 

4.3. Stellar Mass to Light Ratio 

As shown in Figures [5}|8l our models demonstrate a strong 
correlation between h and i f606w, the stellar mass-to-light ra- 
tio, as a consequence of the radial profile degeneracy in lens- 
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Fig. 1 1 . — (a, left) A typical critical curve resulting from the new image constraints in Table 2. Notice the fold image pairs that the curve runs though South 
and East of quasar B. (b, right) Critical curves corresponding to models with minimal/maximal values of different parameters, for our models with an NFW dark 
matter halo (results are similar for other model classes). The critical curves show little variation along the semi-minor axis due to strong constraints from new 
images. Significant variation still exists along the semi-major axis of the curves. 



ing. Any external constraints on Tf606w could not only nar- 
row the overall parameter space but also tighten constraints 
onh.lt is possible to obtain such constraints from stellar pop- 
ulation synthesis (SPS) models thanks to the fact that we have 
excellent HST photometry of a relatively "simple," old stellar 
population in the lens galaxy. 
We use two sets of SPS models. The first are the Flexi- 



allow star formation to begin anywhere from the CMB red- 
shift, z = 1089, to the redshift of the lens, z = 0.361. We adopt 
the star formation rate 



^(0 = 



1-C 



-t/r 



ble Stella r Population Synthesis (FSPS) models of Conroyet 
aL] ( |2009| ). These models exhibit enormous flexibility and are 



c 

T(Zl)-T(Zform)' 



-T(zi)/r 



(25) 



T(Zform) < t < T(zi), 



aimed at addressing many of the uncertainties of SPS models. 
In particular, the user can not only consider the traditional 
effects of varying the star formation history, star formation 
epoch, metallicity, initial mass function (IMF), and dust, but 
also account for various treatments of the thermally pulsating 
asymptotic giant branch (TP -AGB), blue st ragglers, and blue 
horizontal branch stars. See |Conroy et al.| ( |2009j ) for details 
of the FSPS models and the SPS uncertainties they address. 

•IT 



Using the FSPS models, we follow the treatment of Conroy et 

t. , adjusting 9 models parameters (see Table |4]) that include 
e effects of varying the epoch of star formation, the star for- 
mation history, and dust. We consider six metallicities from 
50% to 160% Solar using a Chabrier IMF (Chabrier 2003]). 

The second set of SPS models we use are from Maraston et 
|ari(|2009 ). These models are based on the same simple stel- 
lar populations (SSPs) as Maraston (2005 ), which treat the 
TP-AGB contribution to the SEDs, but also include a metal 
poor ([Z/H]= -2.2) population comp rising 3% of the mass. 
As shown by Marasto n et al.| ( [2009 ), these models provide 
a good fit to the optical colors of galaxies in the Luminous 
Red Galaxy (LRG) sample from the Sloan Digital Sky Sur- 
vey. This is encouraging because SPS models have histor- 



ically had trouble fitting LRG colors (e.g., |Eisenstein et al. 
200T| [Wake et al.| 2006). Since the lens galaxy in Q0957 
is luminous (L ^ 6.5L^) and red (mFeo6w - f^F^uw = 1.057), 
we postulate that these models should provide a good fit the 
galaxy's spectral energy distributio n. To account for variation 
of the lens galaxy from the SSPs of [Maraston et al.[ we allow 
for variation in the redshift at which star formation begins, the 
star formation history, and dust. As in our FSPS models, we 



where C is the fraction of stars formed at a constant rate, r is 
the e-folding time of the star formation rate, and T(z) denotes 
the age of the universe at redshift z. This form of the star 
formation rate has the advantage of smoothly varying from a 
SSP to a constant star format ion history. For dust, we consider 
the two-parameter model of |Charlot & Fall ( 2000) which in- 
cludes the effect of dust around young stars as well a diffuse 
dust component. Parameters for our Maraston SPS models 
are summarized in Table lU 

As constraints on the SPS models, we use our measure- 
ments of the F606W and F814W magnitudes of the lens 
galaxy (see Table [T]) together wit h a reanalysis of the NIC- 
MO S/F 160 W image obtained by ^^tra^lLl ((20001). The 
revised AB magnitude mFi6ow,AB = 16.84±0.15 (Chien Peng, 
private communication) differs somewhat from the originally 
reported value because it is based on a more sophistocated 
deconvolution of the lens galaxy from the quasar images and 
host arcs. We correct for Galactic extinction using the meth- 
ods of |Schlegel et al.| ( |1998| , finding the value of ^(^-V) = 
0.0095^ 

Initial modeling found that large values of dust extinction, 
extending to Af606w > 2.0 mag, were allowed in the SPS mod- 
els. This was unexpected since early type ga laxies are known 
to have modest dust content (e.g., Schawinski et al.|[2007| ). 
In lensed systems, differential extinction measurements have 
shown that lenses typically exhibit smaller values of dust ex- 
tinctio n than found in nearby, late-type galaxies: [Eliasdottir 
[eFaL]{2006 ) find a mean exti nction of Ay = 0.5 6 =b 0.04 in a 
sample of 10 lenses (also see |Falco et al.||1999| ). For Q0957 
we could in principle rely on previous attempts to measure 
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Fig. 13. — Predicted images of all sources that lie inside the tangential caustic for our lens models. Results are shown for the same model as Figure [T2^ but 
are similar for other models. The two panels correspond to different sets of sources, with the same arrangement as Figure [4] There are some predicted images 
shown here that have not been detected (i.e., they do not appear in Fig.|4|; they are predicted to be below the noise in our H§T data. 
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Fig. 14. — Cumulative posterior probability distribution for the dimension- 
less Hubble constant h with different assumptions about the constraint from 
the quasar radio flux ratio. The solid black line shows our fiducial results 
using the image positions alone (from Fig. [9}. The dashed lines show how 
the results change when we impose constraints from the VLA flux ratio con- 
straints ( Haarsma et al. 1999 1 with the uncertainties increased by a factor of 
3.5, 7.5, and 15, representing different levels at which effects from substruc- 
ture might be understood. These values are consistent with a increase in the 
magnification of image B by a factor of 1.35, 1.20, and 1.0 due to a subhalo 
of mass > 1O^M0. While adding flux constraints clearly tightens the h dis- 
tribution, we do not include them in our final h distribution since we do not 
currently understand the extent to which substructure may be important in 
Q0957. 

the d ust content of the lens ga laxy, but the results are puz- 
zling. [Goicoechea et al.| ( |2005| ) used HST/STIS observations 
to measure the flux ratio Fb/Fa > 1 at optical and u ltravio- 
let wavele ngths (also see the delay corrected ratios of |Colley| 
|et al.| l2Q03 ), which stands in stark contrast to the VLA mea- 
surement Fb/Fa = 0.7 4 ± 0.02 and our m odels predictions. To 
explain this difference [Goicoechea et al.| invoke dust clouds in 
front of image A leading to extinction Ay = 0.30. It is counter- 
intuitive to think that image A (at 18.6 kpc from the center 
of the galaxy) is more heavily extinguished than image B (just 
3.7 kpc from the center). If extinction is indeed the cause 
of the wavelength dependence in the flux ratios, it remains 
unclear how the dust is distributed throughout the rest of the 
galaxy, whether it is clumpy and extends to large radii in other 
directions. For all these reasons, we choose not to constrain 
the dust in the SPS models to a particular value. Neverthe- 
less, in order to avoid unreasonable large extinction values we 
impose a weak, exponential prior of the form ^-^^^eoew/i omag 

In order to derive constraints on Tf606w, we set up an 
MCM C analysis similar to what we use for lens models (see 
§3.4| ) to sample the posterior probability distribution. Since 
the value of Tf606w depends on h (principally through the age 
of the universe as a function of redshift), we run the MCMC 
analysis for discrete values of h from 0.50 to 1.45 in steps of 
Ah = 0.025; the small steps ensure that the median and range 
of Tf606w do not vary by more than 3% from one h value to 
the next, so we can interpolate accurately. Figure [15] shows 
the cumulative distributions for Tf606w for different values of 
h, from both FSPS and Maraston models. In general, both 
models fit the observed F606W-F814W and F606W-F160W 
colors well (x^duced ^1)' t>ut the FSPS models yield a larger 
range for Tf606w- This is not surprising given the amount of 
freedom available in the FSPS models. Previous studies of 



massive ellipiticals found values of Tg of ~ 4 - 10 (e.g., Ger 
hard et al.||2QQri|Grillo et al.||2009| ), in good agreement witli 
the values found hereP] 



Figure [16] shows what happens to the posterior probability 
distribution for the Hubble constant when we impose the SPS 
constraints on Tf606w- Using the FSPS models has little ef- 
fect on the h distribution, because these models allow for a 
large range of values for Tf606w- The Maraston models, by 
contrast, favor Tf606w ^ 4.5-6.5 and such values tend to re- 
duce the median h and tighten the distribution. Clearly it is 
important to understand why the FSPS and Maraston mod- 
els differ as to whether high values of Tf606w are accept- 
able. Figure [17] shows that the high Tf606w values attained 
in FSPS models correspond to large values of extinction — 
values that seem surprising for a luminous early-type galaxy 
in a modest cluster at redshift z = 0.361. We infer that the 
flexibility of FSPS models is allowing them to reproduce the 
observed colors of the galaxy even with models that do not 
make much sense astrophysically. One way to reconcile the 
FSPS and Maraston models is to constrain the amount of dust 
in our FSPS models. We find that adopting an extinction prior 
of Af^o^w = 0.45 ± 0.2 would bring the FSPS constraints on 
^F606w into agreement with those from the Maraston mod- 
els. Imposing such a prior has little effect on the Tf606w con- 
straints from the Maraston models since those models show 
little or no correlation between dust extinction and Tf606W. 

Ultimately we need to decide what to use for our final 
constraints on Tf606W. Since the Maraston models are con- 
structed to match the SDSS LRG sample and require no ad 
hoc assumptions about the dust content of the lens galaxy, we 
elect to use them when reporting our final determination of h 
and model parameters (see Table [6]). With these constraints 
on Tf606w we find Hq = 79.3!|^ km s'^ Mpc"^ (68% CL). 

4.4. Physical Properties ofNFW Halos 

When fitting models with an NFW dark matter halo, we pre- 
viously took both rs and Ks = Psrs/^crit to be free parameters. 
However, in A/^-body simulations the two NFW parameters are 
actually related to one another, albeit with some scatter. We 
now consider whether our lens model parameters have reason- 
able values in general, and whether they are consistent with 
the correlation found in simulated halos. 

Parameterizing NFW halo s with the virial mass My/r and 
concentration Cv, |Maccio et al. ( 2007 ) find that the parameters 
are related by c,(z) = 213!40m;"/"^=^^-^^^ (1 +z)"^ We can 
express our lens model parameters in terms of Mytr and as 
follows: 



1+z 



2/3 



(26) 



F>olF>ls 2 



Do 

1 



ln(l+Cv) 



2GMy, 



1/3 



(27) 



where Ay/^ = 98 is the virial overdensity ( [Mainini et al. 2003 
Maccio et al.|2007| ). Figure[T8]compares our recov ered moHe 



val ues of and k.s with expectations based on the |Maccio et 
[aL| ( |2007| ) relation for different values of the NFW halo mass. 
The first point is that the model parameters do indeed have 

^ At a redshift of z = 0.36, observed Tf606W corresponds to rest frame T^. 
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Fig. 15. — Cumulative posterior probability distributions for the stellar mass to light ratio, Tf606W- The differe nt curves corresp ond to different values of h, 
varying from 0.55 (black) to 1.10 (Hght orange) in increments of 0.05. (a, left) Results from the FSPS models of Co nroy et'ar| j2009). (b, right) Results from the 
SPS models of Maraston et al. ( 2009). The FSPS models tend to produce higher values of Tf606W with more scatter than the Maraston models. Also, the FSPS 
models are not as systematically dependent on h, presumably because the large freedom in the models dominates the distribution of Tf606W- 



reasonable values. Going into more detail, we see that our 
models with small scale radii (r^ < lO'O are consistent with 
relatively low cluster masses (My^y ^ 10^^-^ M©) but concen- 
trations that are 3-6a above the median for that mass. As the 
scale radius increases, our models follow a track correspond- 
ing to increasing mass and decreasing concentration; indeed, 
lens models with > 200^' require an extraodinarily large 
mass of M^ir > 10^^ M©. Such a large mass seems unreason- 
able for a fairly modest galaxy cluster, especially considering 
that X-ray measur ements imply a mas s within 1 h^l Mpc of 
9.9!^| X lO^^M© (Chartas et al.|2002 V Therefore, we argue 
that our models with ^ 2U0^^ are disfavored but models with 
rs < 20C have parameters that seem reasonable in compari- 
son with simulated NFW halos. We note that recovering a 
concentration that is a few sigma above the median is not nec- 
essarily a concern, because there may be a selection bias such 
that a high concentration increases the lensing cross section 
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Fig. 16. — Cumulative posterior probability distribution for h with and 
without constraints from SPS models. Since the Maraston SPS models have 
been shown to fit LRG colors from the SDSS, we adopt them when quoting 
final values of h and model parameters. We find Hq = km s~^ Mpc~^ 

) CL). 



(see, e.g., |Mandelbaum et al.|2009] ). 



5. DISCUSSION AND CONCLUSIONS 

Since its discovery in 1979, Q0957 has presented many 
puzzles that we still cannot definitively solve. Nevertheless, 
by combining new HST/ACS data and stellar population syn- 
thesis models, we have presented a consistent picture of lens- 
ing in this system using a realistic treatment of both the st ellar 
and dark matter components of the mass distribution. In |5.1| 
we discuss our results regarding measurements of Hq with 
Q0957. Turning the tables, in |5.2| we adopt priors on Hq 
from other measurements and examine the inferred proper- 
ties of the lens mass distribution. Looking ahead, in |5.3| we 
discuss potential ways in which these measurements can be 
improved. 
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Fig. 17. — 2D histogram showing the joint probability distribution for 
the stellar mass-to-light ratio, Tf606W. and the amount of extinction in the 
F606W band, from FSPS models. As the amount of extinction increases, the 
model magnitude in the F606W filter increases, leading to a larger values of 
Tt^Feoew- Results are shown for models with h = 0.7 and Solar metallicity, 
but the distributions for other values of h and metallicity exhibit a similar 
behavior. 
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5.1. Hubble Constant 

Motivated by our new ACS data, we conducted a joint 
strong+weak lensing analysis in the l iope of obtaining the best 



constraints to date from Q0957. In ^4.1| we found //q = 85;^|3 
km s~^ Mpc~^ (68% CL) on the basis of lensing alone. This 
result is higher than, but still consistent with, measurements 
from other recent lensing (e.g., Jakob sson et al. 2005 ; Paraficz 
et al. 2009 ; Oguri 2007 ) and non-lensi ng (e.g., Freedman et all 
2001 . ,Riess et al...2009t iDunkley et al.||2008^ |Kom atsu et ay 
2008| ) studies . In spite of the extensive lensing data we have 



obtained, the uncertainty in Hq from Q0957 is still larger than 
from most other lenses (see Fig.fl]). 

One source of uncertainty in Q0957 is the sheer complex- 
ity of the potential: with ellipticity, shear, and higher-order 
environmental terms to play with, models can find a wide 
range of combinations that fit the data well (see Figs. [5}|8]). 
The main systematic effect in lens models is a correlation be- 
tween h and the stellar mass-to-light ratio of the lens galaxy. 
Varying Tf606w changes the balance between stars and dark 
matter in the lens galaxy, which modifies the net density pro- 
file, which then affects h through the radial profile degeneracy 
(e.g., | Kochanek||2002| ). We can actually turn this degener- 
acy to our advantage if we can place indep endent constraints 
on the stellar mass-to-light ratio. In |4.3|we used the stellar 
population synthesis models from Maraston et al.| ( |2009| to 
constrain Tf606w and thereby reduce the uncertainties m our 
Hubble constant determination to = 79.3;^g ^ km s~^ Mpc~^ 
(68% CL). 

While this is a significant reduction in the uncertainty for 
Hq from Q0957, we caution that SPS models are still improv- 
ing and may ultimately be even more complicated tha n the 
Maraston mode ls . When we used the FSPS models of iCon-l 
roy et al.| ( |2009| ), for example, we did not see much tightening 



of the Hq constraints relative to lensing alone; and we traced 
the trouble to uncertainties in the amount of dust extinction in 
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Fig. 18. — The space of scale radius, r^, and halo normalization, Ks, for 
NFW halos at the lens redshift z = 0.361. The shaded region indicates the 
parameter range occupied by our lens models with an NFW dark matter halo. 
The dotted curve shows the expected relation between Ks and based on the 
median concentration/mass relation found in A/^-body simulations by Maccio 
|et al. ( 2007 1, while the dot-dash curves show the ±3cr range, and the dashed 
curve in the upper left corner lies 6cr above the median. The colored solid 
curves represent the theoretical predictions at fixed virial mass, with the con- 
centration varying ±3cr around the median, ranging from 10^2^0 (black) to 
10^^ M0 (orange) in steps of 0.5 dex. 



the lens galaxy. There is one additional source of uncertainty 
in SPS models that we did not explicitly address, namely the 
IMF. Variations in the IMF can alter the co lors and mass-to- 
light ratios of SPS models (e.g., |Conroy et al. 2009 ). In par- 
ticular, a more bottom-heavy IMF (e.g., Salpeter) would raise 
the inferred value of the Tf606w and hence our median value 
of Hq, while a more top-heavy IMF would have the oppo- 
site effect. For the Maraston SPS models, variations in the 
IMF must be relatively small or the models would cease to 
provide a good fit to the SDSS LRG sample. We attempt 
to compensate for IMF-related variations by allowing broad 
ranges for the star formation history, star formation epoch, 
and dust. Nevertheless, this remains an unknown, but pre- 
sumably small, uncertainty in our models. Clearly there is a 
lot of room for improvement with a better understanding of 
the stellar population of the lens galaxy (see |5.3|). 



5.2. Mass Distribution 

Instead of trying to measure Hq ourselves, we can choose to 
place external priors on Hq to see how well we can understand 
the mass distribution of the lens. We consider two determina- 
tions of Hq: the refurbished distance ladder measurement of 
Ho = 74.2 ± 3.6 km s'^ Mpc'^ by Ries s et al.| ( |2009| ), and the 
combined WMAP5+SNe+BA0 value of^^703± 1.3 from 



[Komatsu et al. ( 2008 ). In Table [7] we show the model param- 
eters recovered from this approach. 

Examining the relative probabilities of the models, it is 
clear that the lensing data favor a power law profile of index 
a = 0.5 or 1.0 over a = 1.5 or an NFW profile. Both of the 
favored models exhibit relatively large core radii: a = 5.St\'o 
arcsec for a = 0.5, or a = 4.2;^q7 arcsec for a = 1. Given the 
reduced probability of our NFW models, we conclude that 
lensing provides strong evidence for a constant-density core 
(rather than a cusp) in the dark matter halo of Q0957. 

In Figure[TO]we showed that Q0957 exhibits a rising deflec- 
tion profile, analogous to a rising rotation curve and indicative 
of a net density profile shallower than isothermal. While this 
is not the first case of a lens with a rising rotation curve (see 
Kochanek et al.||2006| ), the origin of phenomenon is unclear. 



One possible explanation involves the special location of the 
lens galaxy. As the central galaxy in a modest cluster, the lens 
is embedded in the most concentrated part of a massive dark 
matter halo. The higher than average surface density of dark 
matter leads to a shallow density profile and a rising deflec- 
tion curve. To further explore this idea, we shown in Figure 19 
the fraction of the deflection contributed by dark matter as a 
function of radius. This is equivalent to the projected enclosed 
dark matter fraction as a function of radius. At the effective 
radius, dark matter constitutes (50 ± 7)% or (57 ± 7)% of the 
enclosed mass (assuming the Riess or Komatsu priors on 
respectively). Such values are well above the minimum dark 
matter fraction found (38 ± 7)% for galaxies in the SLACS 
survey ( Bolton et al. 2008 ), and are greater than 16 of the 22 
lens systems considered by ( [Jiang & Koch anek 2007 ). It is 
not very surprising, of course, to find a relatively large dark 
matter fraction in a brightest cluster galaxy. 

We find the the dark matter halo in Q0957 must be well 
aligned with stellar mass distribution. With either the Riess 
or Komatsu Hq priors, we find the position angle of the dark 
matter halo to be Oe = 73;^^q deg, in good agreement with the 
measured position angle of the stellar component 0^ = 71 ±5 
deg at large radii (> 5^0- Perhaps more interesting is that the 
ellipticity of the dark matter appears to be in good agreement 
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Fig. 19. — Fraction of the monopole deflection contributed by the dark 
matter halo, as a function of radius. Since a(r) oc M{r)/r, this plots also 
depicts the enclosed projected dark matter fraction as a function of radius. 
The vertical dotted line indicates the Einstein radius at Rein = 2.M" while 
the vertical dashed line marks the effective radius of the stellar light profile 
at Re = 2.21" . The shaded regions show results from our models when we 
adopt different priors on the Hubble constant: the orange region corresponds 
to assuming Hq = 74.2 ± 3.6 km s~^ Mpc~^ from Riess et al. ( 2009 1, while 
the region indicated w ith red horizontal line s corresponds to //q = 70.5 ± 
1.3 km s-i Mpc-i from |Komatsu et al.| ( |2008t . 



with that of the stellar distribution. We find the ellipticity of 
the dark matter to be ^ = ^-^^-oxn ^^^ = ^-^^-om the Riess 
or Komatsu Hq values, in good agreement with the measured 
value of the stellar ellipticity ~ 0.3 at large radii. 

In general, we find that improving Hq constraints from 5% 
to 2% has little impact on our understanding of the mass dis- 
tribution, because most of our model parameters have little or 
no correlation with ^o- The only significant exception is in 
our determination of the stellar mass to light ratio, Tf606W. 
We find Tf606w = 5.5;^q5 using the Riess value for Hq, versus 
Tf606w = 5.5;^Q 3 using the Komatsu value. For comparison, 
the Maraston SPS models give Tf606w = 5.9 ± 1.9. It is inter- 
esting to see that combining lensing with Hq priors can pro- 
vide excellent constraints on stellar populations, which may 
prove useful as SPS laboratories as multi- wavelength datasets 
for well- studied lenses grow. 

We found the intriguing result that our models favor a 
quasar flux ratio around Fb/Fa = 0.53 ±0.06, which is sub- 
stantially different from the radio measurement of Fb/Fa = 
0.74 ± 0.02. Since the radio emission is free from extinction 
by dust, and should be insensitive to microlensing by stars, 
we infer the quasar flux ratio in Q0957 seems to be "anoma- 
lous." The putative anomaly presumably indicates additional 
complexity in the lens potential. While the nature of that com- 
plexity is not yet clear, it is worth noting that dark mat ter sub- 
structure can easily produce flux ratio anomalies (e.g.,|Mao &| 
Schneider 1998; Metcalf & Madau 2001 ; Dalai & Kochanek 
2002nChiba et al.|2005HMacLeod et al.,2009| ). Invoking sub- 
structure as an explanation seems especially enticing because 
the radio jets, a mere 80 mas aw ay from t he quasar images, 
have a flux ratio of 0.61 ± 0.04 ( [Bonometti, 1985 ), in much 
better agreement with the macro models. While we cannot 
definitively identify substructure from the present analysis, 
the evidence is fascinating and warrants future study. 



5.3. Future Prospects 

Looking ahead, there are several ways in which we can 
hope to improve the measurement of Hq in Q0957. Follow- 
ing this work, the best improvements are likely to come from 
a better understanding of the stellar population and Tf606W. 
In the near term, an extension of the photometric data to both 
bluer and redder wavelengths could reduce the uncertainties 
in existing SPS models and, therefore, the uncertainties in 
Hq. Over the longer term, extensive photometry spanning 
UV/optical/infrared wavelengths, coupled with a better un- 
derstanding of SPS uncertainties (e.g., TP-AGB stars, blue 
stragglers, IMF), should boost the reliability and reduce the 
uncertainties in the SPS technique. 

Improved lensing constraints may also help. One source 
of systematic uncertainty in our models is the uncertainty in 
the measurement of the total convergence from weak lensing. 
Currently, the total convergence is measured to ~ 34% pre- 
cision. Deeper imaging would be observationally expensive 
but worthwhile, especially if coupled with improved under- 
standing of the point spread function and source redshift dis- 
tribution for the weak lensing analysis. For strong lensing, we 
have noted that there are additional faint lensed features that 
we have not used, but that might further constrain the lensing 
critical curve (see the discussion accompanying Figs. [Tl] and 
T3] ). We note for the record that using the precise position 
and flux ratio constraints for the quasars is not likely to help 
us understand the (large-scale) mass distribution or Hq any 
better; those data will ultimately be most useful for probing 
small-scale structure in the lens. 

Finally, it is interesting to consider whether stellar dynam- 
ics data could further constrain our models. Previous studies 
have shown that joint lensing+dynamics analyses can provide 
more info rmation than lensing alone about the mass dis tribu- 
tion (e.g., |Treu et al.|20Q6{|Barnabe & Koopmans|2007| ). The 
joint approach has been used to good effect by the SLACS 
team to improve constraints on quantities like the total mass 
to light ratio and t he slope of the inner density p rofile ((Koop- 
|mans et al.||2006|). For Q0957, Romanowsky & Kochanek 
(1999 ) have successfully combined information from stellar 
dynamics and lensing to tighten constraints on mass mod- 
els. Adopting the measured central velocity dispersion from 
|Tonry & Franxj ( |1997| ), (Romanowsky & Kochanek| use orbit 
modeling techniques to constrain a spherical power law pro- 
file, measuring Hq = 61t\l km s~^ Mpc~^ (2cr). While it is 
clear such an analysis would be useful to further constrain our 
models, it would require orbit modeling for each of our model 
classes, which is beyond the scope of this work. Furthermore, 
it would require a deprojection of the stellar component that 
we include in our models, which could be challenging (be- 
cause of the varying ellipticity and orientation; cf. Figure |3]) 
and would have uncertainties of its own. 
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TABLE 1 
Lens Galaxy Photometry 



Filter 


Total Counts 


mAB 


Zeropoint, AB^ 


F606W 


3328.6 


18.809 ±0.061 


27.614 


F814W 


5372.9 


17.743 ±0.065 


27.068 



^ Quoted zeropoints differ from standard values as we must 
correct for a plate scale change from 0.05'^ to 0.03'' pixels 



TABLE 2 
Modeling Constraints 



Feature Position {"^ Symbol in Figure 3 



Gl 


(0,0) ±0.00001 


- 


Quasar A 


(1.408, 5.034) ±0.03 


Green Plus 


Quasar B 


(0.182,-1.018)±0.03 


Green Plus 


Jet A5 


(1.392, 5.080) ±0.03 


- 


JetB5 


(0.164, -0.962) ±0.03 




lA 


(2.878, 3.453) ±0.05 


Red Circle 


IB 


(-1.362,-0.043) ±0.05 


Red Circle 


IIA 


(2.666, 3.634) ±0.05 


Blue Circle 


IIB 


(-1.457, -0.075) ±0.05 


Blue Circle 


IIIA 


(2.395, 3.694) ±0.05 


Cyan Circle 


IIIB 


(-1.682, -0.026) ±0.05 


Cyan Circle 


IVA 


(0.021,-2.532) ±0.03 


Green Diamond 


IVB 


(0.512, -2.386) ±0.03 


Green Diamond 


VA 


(2.111, 3.664)±0.05 


Red Diamond 


VB 


(-0.768, -2.640) ±0.05 


Red Diamond 


VIA 


(1.875, 3.488)±0.12 


Blue Diamond 


VIB 


(-1.128,-2.777) ±0.05 


Blue Diamond 


VIC 


(1.523,-1.634) ±0.05 


Blue Diamond 


VIIA 


(1.875, 3.488)±0.12 




VIIB 


(-1.065,-2.786) ±0.05 




VIIC 


(1.489,-1.688)±0.05 




VIIIA 


(2.280, 3.391) ±0.09 


Magenta Diamond 


VIIIB 


(-0.454, -2.873) ±0.09 


Magenta Diamond 


IXA 


(-2.003, -2.435) ±0.05 


Cyan Diamond 


IXB 


(1.293, 3.611)±0.05 


Cyan Diamond 


XA 


(-1.708,-1.878) ±0.08 


Yellow Circle 


XB 


(-2.181,-0.551) ±0.08 


Yellow Circle 


XIA 


(-2.070, -1.252) ±0.08 


Magenta Circle 


XIB 


(-2. 112, -1.053) ±0.08 


Magenta Circle 


XIIA 


(-2.745,-0.516) ±0.08 


Green Circle 


XIIB 


(-2.781,-0.699) ±0.08 


Green Circle 



Note. — New features are indicated with Roman numer- 
als 

^ Written as (x,_y) where x is West and y is North. 



TABLE 3 
Model Parameters 



Parameter 


Label 


Stellar mass to light ratio 


T^F606W 


Halo ellipticity 


e 


Position angle 


PA 


Shear angle 




Core radius {a models) 


a 


Scale radius (NEW models) 


rs 


a angle 




5 angle 


Os 


Halo mass normalization 


bd 


External Shear 


7 


3'"^ order term 


cr 


3'"^ order term 


5 



Note. — Power law a. models use a 
core radius a, while NEW models use a 
scale radius r^. 
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TABLE 4 
SPS Model Parameters 



Parameter 


Prior 


FSPS 


Maraston SPS 


Formation redshift, Zf 


0.361 


-1089 


/ 


/ 


SFR e-folding time, r 


0- 


oo 


/ 


/ 


Constant SFR, C 


0- 


-1 


/ 


/ 


Dust around young stars, ri 


0- 


oo 


/ 


/ 


Diffuse dust^ , T2 


P(x.e 


-1.086T2 


/ 


/ 


Fraction of blue HB stars^, fBHB 


0- 


0.5 


/ 




Specific frequency of blue stragglers^, ^bs 


0- 


10 


/ 




Shift in logiLboi) along the TP-AGB^, Al 


-0.4 


-0.4 


/ 




Shift in logiTeff) along the TP-AGB^, Ar 


-0.2 


-0.2 


/ 





^ We use a exponential pri or on the diffuse dust content of the form e ^i^606w/i 0mag 
2 See |Conroy erZli2009} for details 



TABLE 5 
Model Results: HST-ACS data 



Median model values with 68% CL (95% CL) uncertainties 

NEW a = 0.5 a = 1.0 a = 1.5 



Parameter 



bd (") 

e 

Oe n 

7(xl02) 

an 

Cr(xl03) 

Oa n 

(5(X103) 



097+^- 01 

0.46!°:]^ 

70!^ 
7.8+I-0 



2 6+1-^ 
335_2g]^ 
2.8!};8 



^-0.034^ 
/+0.30>. 
^^-0.28^ 



V-322^ 
(+3.0^ 



5.4+2-5 
31 +0-^4 

^•■^^-0.13 

72+9 
5.7+1-? 



1.9: 

210+- 



(+3.9-) 

(#) 

^-154^ 
.+3.5^1 
'^-4.0^ 



7.4!j-5 
4.4+1 

40+^ 
71+9 
7.8!3-i 

4 0+10 

-2v 

9^0+61 
^■^•'-184 
2,4+1.5 

59- 



til 

7+°if; 

r+5-2) 

/+15 



8.6+J-i 

+o:( 

-lis 



3.1+0-6 



0.44; 
72- 



I? 

lllil 



■-^0 

+17 



-+5'l 
-+k? 



7 5+4-2 
42+16 

1 8+0-5 

266!^!^ 
3 3+0-9 
62!io' 



^-0.8^ 
/+0.24>. 
'^-O.IV^ 

d:|) 

+-Po 

+15= 



51+0-07 T+crtTY" 



^5,30" 



0.55; 
0.12; 
89.4- 



0.53: 

0.15: 

75.3- 



50+^-^^ z+u.uy^) 



0.21^ 



-9-iO? 

84 4+11-1 /+18.3\ 
Q^-^-11.3 ^--17.9^ 



on Q+lU.l /+?7^B! 
-IM ^-16.0^ 



-0.04 
+10.1 



Prel 



0.38 
0.014 



0.91 
0.575 



0.90 
1.000 



0.58 
0.114 



TABLE 6 

Model Results: HST-ACS data + Maraston SPS Models 



Median model values with 68% CL (95% CL) uncertainties 
NFW a = 0.5 a = 1.0 



Parameter 



a = 1.5 



bi (") 

e 

Oe n 

7(xl02) 

an 

o-(xl03) 

Oa n 



5 7+1-2 

n 1 1 O+0.028 
^•^^^-0.018 

34+0-09 

'^•-^ -0.09 

70!^ 
8.6!3-0 



55+' 



12 



2 0+1-^ 
235+i 
2.8!j-4 



(-!•) 

/+0.107>. 



-0.025 
+0.14^ 
-0.26^ 
/+15^ 



(! ) 



+2.5 ^ 



/+80 ^) 

f+2-4) 
^-1.9^ 
/+18^) 



c 3+1.1 

6.4+1-5 

0.30; 



73+8 

7 7+2.8 

5.8!j:j 

211+39 
^^^-119 

2 5+1-1 

^•-^-0.9 

61! 



/+0.16>. 

'^-Q.ie^ 

/+13 ^ 

^-1.54^ 
/+4.K 

^^-3.4^ 
f+16) 

+-?.? 

+1l^ 

'^-162^ 

r+1-^) 



+o;% 

-+v^ 



n ^0+0.05 /+0.09\ 



6.5!|-0 
5.0; ' 

0.32 

72-^ 

52!?, 
4 0+1-0 

1 S+I-4 
-0,9 

228^36 

2.4!i:i 

60!i6 

53+0-05 
-Q-Q6 
1 6+0-02 



(-2:) 
/+0.115>. 



-+^^ 
^-23,^ 



^9+0.9 

3 5+0-§ 
0.34+11 

76!^ 
9.8!2-^ 

49+^ 

1 4+0-5 

1.9+ 



229+ 



/+O.08\ 



a+'i 

-70 

3 0+1-2 
67+10 

——^114 
n s 8+0-07 
^•^^-0.06 

0.23 



r+i-4) 

^^-1.4^ 



/+u.y\ 
'^-0.9^ 
/+0.1K 
^-0.12^ 



/+6.K 
(+15) 

^^-146^ 

M 

T+crnV 



Ib/Ia 

^5,30" 

Ho 



0.6l!i 
0.17 
82.9: 



-0.05 ^-0.09 
+6.9 ^+12.2 



0.13^ 
73.75 



■0.02 
7.6 

7^ 



-0.04 
+12.4 



-0.05 



-0.02 



^-0.02 
(+8.7 



min(x 



reduced-' 
Prel 



0.57 
0.010 



0.99 
0.814 



0.98 
1.000 



0.83 
0.053 



23 



TABLE 7 

Model Results: HST-ACS data + Hq priors 



Median model values with 68% CL (95% CL) uncertainties 
NFW a = 0.5 a = 1.0 a = 1.5 



Parameter 



bi (") 

e 

Oe n 

7(xl02) 

an 

f7(Xl03) 

Oa n 
05 n 



43+0.7 

0.135: 
0.25 



A?)28 



57+^ 



2.5+^-^ 
213+23 



2.8^}-^ 



3.6!|;-2 
0.19 



-0.04 

69!|j 

io.o!|o 

58i 



3 3+1-2 
211+10 

7Q+19 



6.3!!-^ 



D7 

10 
74+y 

7 6+2-7 
50+11 

^•^^LO 
,+Ll 
LO 



0.31 



1.9+ 



215^1 



5.1^0-2 
6.7+^-^ 
26+^-07 

74+9 
7.8^2-4 

52+^' 

C Q+l.l 

^•^-LO 

2 1+0-9 
215+30 
9 5+1-1 
62+11 



5.5^: 

+0.08 

,07 



0.28; 
72- 



52; 
4.2: 

1.9 



0+2.5 
^-2.6 
+10 



Co 



rU5~ 

-0.02 



-9-] 
,+1.4 

-Ll 

221^24 
2 3+1-1 
63+1^ 

+0^ 



5 0+0-2 

72+1 

56+^ 
-10 

4.1+0-8 

2 0+1-^ 
-1.0 
220^21 



I55~ 



2.3!} 



63+if 



5 

4.1 

0.25: 
77 



;+8'.S6 
07 

■10 



1 1 0+2-3 

51+16 
1.3+0-6 

9 9+8:9 
-0,9 

180^47 
3.8^2-^ 
69 



5.7+0-3 
4 

^•^-0.5 

oc+0.07 
^-^-0.06 
77+7 

11.0+1^ 
51+10 
1.3+0-5 



2.2" 



68+4 



0155" 

'Ml 

-0.01 



^5,30" 



0.66: 
0.17 



-0.05 



0.007 



70+0.22 

-0^07. 
17+0-03 

^•^ ^-0.04 



0.52: 
0.13 



0.53: 
0.13 



0.53: 
0.17" 



0.003 



1.000 



1.000 



-0.02 



0.55: 
0.17 



0.869 



.-0.02 



0.61 
0.24- 



85 



0.565 



-0.01 



0.006 



O.6O: 
0.24" 



0.000 



Prel 



Note. — Columns on the left use Hq = 74 .2 ±3.6 km s 1 Mpc 1 from |Riess et al.||2009) while columns on the 
right use Hq = 70.5 ± 1.3 km s'l Mpc-i from |Komatsu et al.H2008t . 



